NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

Jiang, Fengqing; Xu, Zhangchen; Niu, Luyao; Lin, Bill Yuchen; Poovendran, Radha (February 2025, The 39th Annual AAAI Conference on Artificial Intelligence)

Large language models (LLMs) are expected to follow in- structions from users and engage in conversations. Tech- niques to enhance LLMs’ instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on safety alignment of LLMs has been less understood, which is crucial for deploying LLMs safely at scale. In this paper, we investigate how chat templates affect safety alignment of LLMs. We identify a common vulnerability, named ChatBug, that is introduced by chat templates. Our key insight to identify ChatBug is that the chat templates provide a rigid format that need to be followed by LLMs, but not by users. Hence, a malicious user may not necessar- ily follow the chat template when prompting LLMs. Instead, malicious users could leverage their knowledge of the chat template and accordingly craft their prompts to bypass safety alignments of LLMs. We study two attacks to exploit the ChatBug vulnerability. Additionally, we demonstrate that the success of multiple existing attacks can be attributed to the ChatBug vulnerability. We show that a malicious user can exploit the ChatBug vulnerability of eight state-of-the- art (SOTA) LLMs and effectively elicit unintended responses from these models. Moreover, we show that ChatBug can be exploited by existing jailbreak attacks to enhance their at- tack success rates. We investigate potential countermeasures to ChatBug. Our results show that while adversarial train- ing effectively mitigates the ChatBug vulnerability, the vic- tim model incurs significant performance degradation. These results highlight the trade-off between safety alignment and helpfulness. Developing new methods for instruction tuning to balance this trade-off is an open and critical direction for future research.
more » « less
Free, publicly-accessible full text available February 25, 2026
Who is Responsible? Explaining Safety Violations in Multi-Agent Cyber-Physical Systems

https://doi.org/10.1109/ICAA64256.2024.00012

Niu, Luyao; Zhang, Hongchao; Sahabandu, Dinuka; Ramasubramanian, Bhaskar; Clark, Andrew; Poovendran, Radha (October 2024, IEEE)

Full Text Available
Risk-Aware Distributed Multi-Agent Reinforcement Learning

https://doi.org/10.23919/ACC60939.2024.10644829

Al_Maruf, Abdullah; Niu, Luyao; Ramasubramanian, Bhaskar; Clark, Andrew; Poovendran, Radha (July 2024, IEEE)

Full Text Available
Fault Tolerant Neural Control Barrier Functions for Robotic Systems under Sensor Faults and Attacks

https://doi.org/10.1109/ICRA57147.2024.10610491

Zhang, Hongchao; Niu, Luyao; Clark, Andrew; Poovendran, Radha (May 2024, IEEE)

Full Text Available
POSTER: Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

https://doi.org/10.1145/3634737.3659430

Sahabandu, Dinuka; Xu, Xiaojun; Rajabi, Arezoo; Niu, Luyao; Ramasubramanian, Bhaskar; Li, Bo; Poovendran, Radha (July 2024, ACM)

Full Text Available
Necessary and Sufficient Conditions for Satisfying Linear Temporal Logic Constraints using Control Barrier Certificates

Niu, Luyao; Clark, Andrew; Poovendran, Radha (December 2023, IEEE Conference on Decision and Control (CDC))

Full Text Available
A Compositional Resilience Index for Computationally Efficient Safety Analysis of Interconnected Systems

Niu, Luyao; Al Maruf, Abdullah; Clark, Andrew; Mertoguno, J.S.; Poovendran, Radha (December 2023, IEEE Conference on Decision and Control (CDC))

Full Text Available
Fed-Game: A Game-Theoretic Defense Against Backdoor Attacks in Federated Learning

Jia, Jinyuan; Yuan, Zhuowen; Sahabandu, Dinuka; Niu, Luyao; Rajabi, Arezoo; Ramasubramanian, Bhaskar; Li; Poovendran, Radha (December 2023, In 37th Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available
A Timing-Based Framework for Designing Resilient Cyber-Physical Systems under Safety Constraint

https://doi.org/10.1145/3594638

Al Maruf, Abdullah; Niu, Luyao; Clark, Andrew; Mertoguno, J. Sukarno; Poovendran, Radha (July 2023, ACM Transactions on Cyber-Physical Systems)

Cyber-physical systems (CPS) are required to satisfy safety constraints in various application domains such as robotics, industrial manufacturing systems, and power systems. Faults and cyber attacks have been shown to cause safety violations, which can damage the system and endanger human lives. Resilient architectures have been proposed to ensure safety of CPS under such faults and attacks via methodologies including redundancy and restarting from safe operating conditions. The existing resilient architectures for CPS utilize different mechanisms to guarantee safety, and currently, there is no common framework to compare them. Moreover, the analysis and design undertaken for CPS employing one architecture is not readily extendable to another. In this article, we propose a timing-based framework for CPS employing various resilient architectures and develop a common methodology for safety analysis and computation of control policies and design parameters. Using the insight that the cyber subsystem operates in one out of a finite number of statuses, we first develop a hybrid system model that captures CPS adopting any of these architectures. Based on the hybrid system, we formulate the problem of joint computation of control policies and associated timing parameters for CPS to satisfy a given safety constraint and derive sufficient conditions for the solution. Utilizing the derived conditions, we provide an algorithm to compute control policies and timing parameters relevant to the employed architecture. We also note that our solution can be applied to a wide class of CPS with polynomial dynamics and also allows incorporation of new architectures. We verify our proposed framework by performing a case study on adaptive cruise control of vehicles.
more » « less
Full Text Available
Robust Satisfaction of Metric Interval Temporal Logic Objectives in Adversarial Environments

https://doi.org/10.3390/g14020030

Niu, Luyao; Ramasubramanian, Bhaskar; Clark, Andrew; Poovendran, Radha (April 2023, Games)

This paper studies the synthesis of controllers for cyber-physical systems (CPSs) that are required to carry out complex time-sensitive tasks in the presence of an adversary. The time-sensitive task is specified as a formula in the metric interval temporal logic (MITL). CPSs that operate in adversarial environments have typically been abstracted as stochastic games (SGs); however, because traditional SG models do not incorporate a notion of time, they cannot be used in a setting where the objective is time-sensitive. To address this, we introduce durational stochastic games (DSGs). DSGs generalize SGs to incorporate a notion of time and model the adversary’s abilities to tamper with the control input (actuator attack) and manipulate the timing information that is perceived by the CPS (timing attack). We define notions of spatial, temporal, and spatio-temporal robustness to quantify the amounts by which system trajectories under the synthesized policy can be perturbed in space and time without affecting satisfaction of the MITL objective. In the case of an actuator attack, we design computational procedures to synthesize controllers that will satisfy the MITL task along with a guarantee of its robustness. In the presence of a timing attack, we relax the robustness constraint to develop a value iteration-based procedure to compute the CPS policy as a finite-state controller to maximize the probability of satisfying the MITL task. A numerical evaluation of our approach is presented on a signalized traffic network to illustrate our results.
more » « less
Full Text Available

« Prev Next »

Search for: All records